Chapter 14 Statistics (Concepts)

Welcome to this comprehensive exploration of Statistics, a vital branch of mathematics concerned with the collection, organization, analysis, interpretation, and presentation of data. Building substantially upon the introductory concepts from Class 8, this chapter delves deeper into more sophisticated methods for handling information, enabling us to extract meaningful insights and draw representative conclusions from datasets. We will refine our techniques for managing raw information and introduce powerful numerical summaries that capture the essence of the data.

Our journey begins by revisiting the fundamental processes of data collection, distinguishing between primary data (collected firsthand) and secondary data (obtained from existing sources). We emphasize the importance of moving from raw, often chaotic, data presentation to structured formats. Key organizational tools like frequency distribution tables are revisited and refined. For large datasets, constructing grouped frequency distribution tables becomes essential. This involves defining appropriate class intervals (like $0-10$, $10-20$), calculating the class size or width ($h$), determining the class marks ($x_i$, the midpoint of each interval, calculated as $\frac{\text{Upper limit + Lower limit}}{2}$), and finding the overall range of the data (Maximum value - Minimum value). Tally marks (e.g., $||||$ for 4, $\bcancel{||||}$ for 5) are often used in the process of counting frequencies ($f_i$) for each class.

Graphical representation techniques become more nuanced. While bar graphs serve well for discrete data, our focus shifts towards visualizing continuous grouped data:

Histograms: These are the go-to representation for grouped frequency distributions. Unlike bar graphs, the bars in a histogram are drawn adjacent to each other (no gaps, signifying continuity), with the width representing the class interval and the height representing the frequency. We explore constructing histograms with uniform class widths and also introduce the more complex case of histograms with varying widths. In the latter, the heights of the rectangles must be adjusted proportionally (using frequency density = $\frac{\text{Frequency}}{\text{Class Width}}$) so that the area of each rectangle, not just the height, is proportional to the frequency it represents.
Frequency Polygons: Offering a different visual perspective, a frequency polygon is a line graph formed by joining the midpoints of the tops of the bars in a histogram. Alternatively, it can be constructed independently by plotting points whose x-coordinates are the class marks ($x_i$) and whose y-coordinates are the corresponding frequencies ($f_i$), and then joining these points with line segments. For closure, the polygon is typically extended to meet the x-axis at the class marks of hypothetical intervals preceding the first and succeeding the last actual interval.

Beyond visualization, we introduce crucial numerical summaries known as Measures of Central Tendency. These provide a single value that attempts to describe the 'center' or typical value of a dataset:

Mean ($\bar{x}$): The arithmetic average.
- For ungrouped data: $\bar{x} = \frac{\text{Sum of all observations}}{\text{Number of observations}} = \frac{\sum\limits x_i}{n}$.
- For grouped data (Direct Method): $\bar{x} = \frac{\sum\limits_{i} f_i x_i}{\sum\limits_{i} f_i}$, where $f_i$ are frequencies and $x_i$ are class marks. For large datasets, the assumed mean and step-deviation methods can simplify this calculation.
Median: The value of the middlemost observation when the data is arranged in ascending or descending order.
- For ungrouped data: If the number of observations ($n$) is odd, Median = Value of the $(\frac{n+1}{2})^{th}$ observation. If $n$ is even, Median = Average of the values of the $(\frac{n}{2})^{th}$ and $(\frac{n}{2} + 1)^{th}$ observations.
- For grouped data: Identifying the median class and applying a specific formula (often deferred to Class 10) involving its lower limit, cumulative frequency, frequency, and class size.
Mode: The value which occurs most frequently in the dataset.
- For ungrouped data: Identified by simple inspection.
- For grouped data: Involves identifying the modal class (the class interval with the highest frequency) and potentially using a formula (often detailed in Class 10) involving its limits and the frequencies of neighboring classes.
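To make these formulas concrete, here is a minimal Python sketch. The function names are our own, chosen only for illustration. It implements the direct-method mean for grouped data, the odd/even median rule for ungrouped data and a mode found by counting. The inputs are the marks, children and class data used in the worked examples of this chapter.

```python
from collections import Counter


def mean_grouped_direct(class_marks, freqs):
    """Direct method: sum of f_i * x_i divided by sum of f_i."""
    return sum(f * x for f, x in zip(freqs, class_marks)) / sum(freqs)


def median_ungrouped(values):
    """Median of raw observations using the odd/even rule."""
    data = sorted(values)
    n = len(data)
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th observation (list indices start at 0).
        return data[(n + 1) // 2 - 1]
    # Even n: average of the (n / 2)-th and (n / 2 + 1)-th observations.
    return (data[n // 2 - 1] + data[n // 2]) / 2


def modes_ungrouped(values):
    """Every value that occurs with the highest frequency (there may be ties)."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


marks = [55, 36, 95, 73, 60, 42, 25, 78, 75, 62]
children = [2, 3, 1, 5, 2, 2, 3, 4, 1, 2, 5, 3, 3, 2, 1, 4, 2, 3, 1, 2]
class_marks = [15, 25, 35, 45, 55, 65, 75, 85, 95]
freqs = [1, 1, 3, 4, 5, 4, 5, 3, 4]

print(median_ungrouped(marks))                            # 61.0
print(modes_ungrouped(children))                          # [2]
print(round(mean_grouped_direct(class_marks, freqs), 2))  # 62.33
```

The mode function returns a list rather than a single number because a dataset can have more than one value with the highest frequency.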

Understanding when to use each measure is also important; for instance, the median is often preferred over the mean when the data contains extreme values (outliers) because it is less sensitive to them.

Basic Terms and Concepts Related to Statistics

In today's information age, we are constantly exposed to numerical facts and figures from various sources like news reports, surveys, academic studies, and government publications. This information could pertain to diverse topics such as cricket scores, economic growth rates, weather patterns, or student performance.

These numerical facts or figures collected for a specific purpose are collectively known as data. The word 'data' originates from the Latin word 'datum', meaning 'something given'.

The term 'Statistics' is derived from the Latin word 'status' or the Italian word 'statista', both of which relate to a 'political state'. Historically, statistics was primarily used by states to collect data relevant to governance, such as population counts or economic indicators. However, the field of Statistics has evolved significantly and now encompasses a much broader scope.

Statistics, as a subject, is a branch of mathematics that provides methods and tools for dealing with data. It involves a systematic process that includes:

Collection of data: Gathering the numerical information from relevant sources.
Organisation of data: Arranging the collected raw data in a meaningful and manageable form.
Presentation of data: Displaying the organised data visually or in tables to make it easy to understand.
Analysis of data: Examining the data to find patterns, relationships, and summaries.
Interpretation of data: Drawing conclusions and making inferences based on the analysis.

Statistics helps us to summarise large datasets, identify trends, make comparisons, and ultimately use data to support decisions and understand phenomena better.

Key Terms in Statistics

Here are some basic terms commonly used in the study of Statistics:

Data:

Data refers to facts, figures, or any other type of information that is collected for a specific purpose. When data is collected in its original form, it is called raw data.

Example: The heights of all students in Class 9 of a school, recorded as they are measured, constitute raw data.

Observation:

Each individual value or entry in a dataset is called an observation.

Example: In the list of heights of students, each student's height is an observation.

Variable / Variate:

A variable or variate is a characteristic, quantity, or attribute that is being studied and that can take on different values for different individuals or objects in a dataset. It's what we are measuring or observing.

Example: In collecting data about the marks obtained by students in a test, 'marks obtained' is the variable.

Frequency:

The frequency of a particular observation (or a group of observations) is the number of times that observation (or values within that group) occurs in a dataset. It tells us how often a specific value appears.

Example: If the mark 75 appears 5 times in a list of test scores, then the frequency of the observation '75' is 5.

Organisation of Data:

Raw data, especially when collected in large quantities, can be messy and difficult to comprehend. Organisation of data is the process of arranging this raw data in a systematic and meaningful way to facilitate analysis and interpretation. Common methods include arranging data in ascending or descending order (arraying) or creating frequency distribution tables.

Range of Data:

The range of a dataset is a simple measure of the spread or dispersion of the data. It is calculated as the difference between the highest (maximum) observation and the lowest (minimum) observation in the dataset.

Range = Highest Observation $-$ Lowest Observation

Types of Data:

Based on the method of collection, data can be broadly classified into two types:

Primary Data: This type of data is collected directly by the investigator or researcher for their specific purpose. The investigator is the first person to gather this information. Example: Conducting a direct survey among a group of people about their voting preferences.
Secondary Data: This type of data is collected by someone else (e.g., a government agency, another researcher, a non-profit organisation) and has already been processed or published. The investigator uses this existing data for their study. Example: Using census data published by the government to study population density in different areas.

Choosing between primary and secondary data depends on the research question, available resources, and time constraints.

Example 1. The marks obtained by 10 students in a test are: 55, 36, 95, 73, 60, 42, 25, 78, 75, 62. Identify the raw data, the variable, and find the range of the data.

Answer:

Given:

Marks obtained by 10 students: 55, 36, 95, 73, 60, 42, 25, 78, 75, 62.

To Identify/Find:

Raw data, variable, and range.

Solution:

The given list of marks in its original form is the raw data.

Raw data: 55, 36, 95, 73, 60, 42, 25, 78, 75, 62.

The characteristic that is being measured for each student, and which varies among the students, is the variable.

Variable: 'Marks obtained by a student'.

To find the range, we need to identify the highest (maximum) and the lowest (minimum) values in the raw data.

Looking at the data: 55, 36, 95, 73, 60, 42, 25, 78, 75, 62.

Highest observation = 95

Lowest observation = 25

The range is the difference between the highest and lowest observations.

Range = Highest Observation $-$ Lowest Observation

Substitute the identified values:

Range = $95 - 25$

Range = $70$

The range of the marks obtained by the students is 70.
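The same calculation can be sketched in a few lines of Python. The function name `data_range` is our own, used only for illustration.

```python
def data_range(observations):
    """Range = highest observation - lowest observation."""
    return max(observations) - min(observations)


marks = [55, 36, 95, 73, 60, 42, 25, 78, 75, 62]
print(max(marks), min(marks), data_range(marks))  # 95 25 70
```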

Ungrouped Frequency Distribution Table

After collecting raw data, the next step in Statistics is to organise it to make it more understandable and amenable to analysis. One simple way to organise data, especially when the number of distinct values is small or the range of data is not very large, is to create a frequency distribution table. A basic type of frequency distribution is the ungrouped frequency distribution.

An ungrouped frequency distribution table is a table that lists each distinct observation (value) that appears in the dataset and shows the number of times (frequency) each distinct observation occurs.

Steps to Create an Ungrouped Frequency Distribution Table:

Let's outline the systematic steps to construct an ungrouped frequency distribution table from raw data:

List Distinct Observations: Identify all the different, unique values present in the raw data. List these distinct observations in a column, usually arranged in either ascending (smallest to largest) or descending (largest to smallest) order. This column is typically labelled with the name of the variable being studied (e.g., 'Marks', 'Number of Children', 'Score').
Tally the Frequencies: Go through the raw data one observation at a time. For each observation, make a mark (a vertical stroke, '|') in a 'Tally Marks' column next to the corresponding distinct value in your table. To make counting easier, especially for larger datasets, group the tally marks in blocks of five. The fifth tally mark is usually drawn diagonally across the previous four vertical marks ($\bcancel{||||}$).
Count the Frequencies: After tallying all observations, count the tally marks for each distinct observation. Write this total count in a 'Frequency' column next to the tally marks.
Verify the Total: Sum up all the frequencies in the 'Frequency' column. This sum should be equal to the total number of observations in the original raw data. This step helps to check if all observations have been accounted for.

Example 1. The following data represents the number of children in 20 families: 2, 3, 1, 5, 2, 2, 3, 4, 1, 2, 5, 3, 3, 2, 1, 4, 2, 3, 1, 2. Prepare an ungrouped frequency distribution table for this data.

Answer:

Given:

The number of children in 20 families: 2, 3, 1, 5, 2, 2, 3, 4, 1, 2, 5, 3, 3, 2, 1, 4, 2, 3, 1, 2.

To Prepare:

An ungrouped frequency distribution table.

Solution:

First, let's identify the distinct observations (number of children) present in the data. The values are 1, 2, 3, 4, and 5. We will list these in ascending order.

Now, we will go through the data and use tally marks to count the frequency of each number:

For '1': Data has 1, 1, 1, 1. Occurs 4 times. Tally: $||||$
For '2': Data has 2, 2, 2, 2, 2, 2, 2. Occurs 7 times. Tally: $\bcancel{||||} \; ||$
For '3': Data has 3, 3, 3, 3, 3. Occurs 5 times. Tally: $\bcancel{||||}$
For '4': Data has 4, 4. Occurs 2 times. Tally: $||$
For '5': Data has 5, 5. Occurs 2 times. Tally: $||$

Now, we compile these counts into an ungrouped frequency distribution table:

Number of Children	Tally Marks	Frequency
1	$\|\|\|\|$	4
2	$\bcancel{\|\|\|\|} \; \|\|$	7
3	$\bcancel{\|\|\|\|}$	5
4	$\|\|$	2
5	$\|\|$	2
Total		20

The sum of the frequencies (4 + 7 + 5 + 2 + 2) is 20, which is equal to the total number of families (observations) given in the data. This confirms that the counting is correct.
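The tallying and verification steps can be sketched in Python with `collections.Counter`, which counts how often each distinct value occurs. The function name `ungrouped_frequency_table` is our own, chosen for illustration.

```python
from collections import Counter


def ungrouped_frequency_table(observations):
    """Return (value, frequency) pairs for every distinct value, in ascending order."""
    return sorted(Counter(observations).items())


children = [2, 3, 1, 5, 2, 2, 3, 4, 1, 2, 5, 3, 3, 2, 1, 4, 2, 3, 1, 2]
table = ungrouped_frequency_table(children)

for value, freq in table:
    print(value, freq)

# Verify the total: the frequencies must add up to the number of observations.
total = sum(freq for _, freq in table)
assert total == len(children)
print("Total", total)
```

Running it lists the frequencies 4, 7, 5, 2 and 2 for families with 1 to 5 children, totalling 20.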

Grouped Frequency Distribution Table

When working with a large dataset or data with a wide range of values, listing every unique observation becomes inefficient. In these cases, we group the data into intervals to create a grouped frequency distribution table.

This type of table organizes data by dividing it into ranges, called class intervals, and then showing the number of observations (frequency) that fall into each interval.

Key Terms for Grouped Data

Understanding the following terms is essential for creating and interpreting grouped frequency distributions.

Class Interval

A class interval is a specific range used to group data points. For example, 10-20, 20-30, etc.

Class Limits

Lower Class Limit: The smallest value in a class interval. For the interval 10-20, the lower limit is 10.
Upper Class Limit: The largest value in a class interval. For the interval 10-20, the upper limit is 20.

Class Size (or Class Width)

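For a continuous (exclusive) class interval, the class size is the difference between the upper class limit and the lower class limit. For inclusive intervals such as 1-10, 11-20, it equals the difference between the lower limits of consecutive classes (here 10).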

Class Size = Upper Class Limit $-$ Lower Class Limit

For the class interval 10-20, the class size is $20 - 10 = 10$.

Class Mark (or Mid-point)

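The class mark is the midpoint of a class interval. It is used to represent all the values within that class in calculations such as the mean of grouped data, and it gives the x-coordinate of each point plotted in a frequency polygon.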

Class Mark $= \frac{\text{Lower Class Limit} + \text{Upper Class Limit}}{2}$

For the interval 10-20, the class mark is $\frac{10+20}{2} = 15$.

Types of Class Intervals

Exclusive (or Continuous) Form: In this form, the upper limit of one class is the same as the lower limit of the next class (e.g., 10-20, 20-30). The upper limit value is excluded from its class and included in the next one. For instance, a data point of 20 would belong to the 20-30 class, not the 10-20 class. Histograms are drawn using this form.
Inclusive (or Discontinuous) Form: In this form, both the lower and upper limits are included in the class (e.g., 10-19, 20-29). There is a gap between the upper limit of one class and the lower limit of the next.

Converting a Discontinuous Distribution to a Continuous Distribution

To draw a histogram, the class intervals must be continuous (in exclusive form). If the data is given in an inclusive (discontinuous) form, we must convert it first.

Steps for Conversion:

Find the difference (gap) between the upper limit of a class and the lower limit of the next class.
Calculate half of this gap. This value is the adjustment factor.
Subtract the adjustment factor from all the lower class limits.
Add the adjustment factor to all the upper class limits.

Example: Convert the discontinuous intervals 1-10, 11-20, 21-30 into continuous intervals.

1. Gap = (Lower limit of 2nd class) - (Upper limit of 1st class) = $11 - 10 = 1$.

2. Adjustment factor = $\frac{\text{Gap}}{2} = \frac{1}{2} = 0.5$.

3. Subtract 0.5 from lower limits: $1-0.5 = 0.5$, $11-0.5=10.5$, $21-0.5=20.5$.

4. Add 0.5 to upper limits: $10+0.5=10.5$, $20+0.5=20.5$, $30+0.5=30.5$.

The new continuous intervals are 0.5-10.5, 10.5-20.5, 20.5-30.5.
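The four conversion steps can be sketched as a small Python function. The name `to_continuous` is hypothetical, and the sketch assumes the intervals are listed in order with the same gap between every pair of consecutive classes.

```python
def to_continuous(intervals):
    """Convert inclusive class intervals [(lower, upper), ...] to continuous form.

    Assumes the intervals are sorted and equally spaced, so the gap between
    consecutive classes is the same throughout.
    """
    if len(intervals) < 2:
        raise ValueError("need at least two classes to measure the gap")
    gap = intervals[1][0] - intervals[0][1]  # lower limit of 2nd - upper limit of 1st
    adjustment = gap / 2
    return [(lower - adjustment, upper + adjustment) for lower, upper in intervals]


print(to_continuous([(1, 10), (11, 20), (21, 30)]))
# [(0.5, 10.5), (10.5, 20.5), (20.5, 30.5)]
```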

Steps to Create a Grouped Frequency Distribution Table

Determine the Range: Find the range of the data (Highest Value - Lowest Value).
Decide Class Intervals: Choose a suitable class size and decide on the class intervals, ensuring they cover the entire range of data. Start the first interval at or just below the lowest value.
Tally Marks: Go through the data one by one and place a tally mark against the class interval where each data point falls.
Count Frequency: Count the tally marks for each interval to get the frequency.
Verify Total: Sum all the frequencies. The total should match the total number of observations.

Example 1. The following are the marks obtained by 30 students in a test: 10, 20, 36, 92, 95, 40, 50, 56, 60, 70, 92, 88, 80, 70, 72, 70, 36, 40, 36, 40, 92, 40, 50, 50, 56, 60, 70, 60, 60, 88. Construct a grouped frequency distribution table for this data with class intervals of size 10.

Answer:

Given:

Marks of 30 students.

10	20	36	92	95	40	50	56	60	70
92	88	80	70	72	70	36	40	36	40
92	40	50	50	56	60	70	60	60	88

Total number of students = 30.

Required class size = 10.

To Construct:

A grouped frequency distribution table.

Solution:

Step 1: Find the range. The lowest mark is 10 and the highest mark is 95, so the range is $95 - 10 = 85$.

Step 2: Create class intervals of size 10. We will use the exclusive form, starting from 10. The intervals will be 10-20, 20-30, ..., up to 90-100 to include the highest value, 95.

Step 3 & 4: Create columns for class intervals, tally marks, and frequency. Go through the data and place tally marks.

Class Interval (Marks)	Tally Marks	Frequency (Number of Students)
10-20	$\|$	1
20-30	$\|$	1
30-40	$\|\|\|$	3
40-50	$\|\|\|\|$	4
50-60	$\bcancel{\|\|\|\|}$	5
60-70	$\|\|\|\|$	4
70-80	$\bcancel{\|\|\|\|}$	5
80-90	$\|\|\|$	3
90-100	$\|\|\|\|$	4
Total		30

Step 5: Verify the total. The sum of frequencies is $1+1+3+4+5+4+5+3+4 = 30$, which matches the total number of students. The table is correct.
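A minimal Python sketch of the same procedure is shown below. The function name and its parameters (`start`, `width`, `num_classes`) are our own choices for illustration. Each observation is placed in the exclusive class whose lower limit it reaches, so a mark of 20 goes into 20-30, exactly as in the exclusive form described earlier.

```python
def grouped_frequency_table(observations, start, width, num_classes):
    """Count observations in exclusive classes start-(start+width), (start+width)-(start+2*width), ...

    A value equal to a class's upper limit is counted in the next class.
    """
    freqs = [0] * num_classes
    for x in observations:
        k = int((x - start) // width)  # index of the class that contains x
        if not 0 <= k < num_classes:
            raise ValueError(f"{x} lies outside the chosen class intervals")
        freqs[k] += 1
    return [((start + k * width, start + (k + 1) * width), f) for k, f in enumerate(freqs)]


marks = [10, 20, 36, 92, 95, 40, 50, 56, 60, 70,
         92, 88, 80, 70, 72, 70, 36, 40, 36, 40,
         92, 40, 50, 50, 56, 60, 70, 60, 60, 88]

table = grouped_frequency_table(marks, start=10, width=10, num_classes=9)
for (lower, upper), f in table:
    print(f"{lower}-{upper}: {f}")
print("Total:", sum(f for _, f in table))  # Total: 30
```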

Cumulative Frequency Distribution

A cumulative frequency distribution shows a running total of the frequencies. It helps us quickly see how many observations fall below or above a certain value or class interval.

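For the 'less than' type, the cumulative frequency of a class is the sum of its own frequency and the frequencies of all the classes before it; for the 'more than' type, it is the sum of its own frequency and the frequencies of all the classes after it.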

There are two types of cumulative frequency distributions:

'Less Than' Type: This shows the number of observations that are less than the upper limit of a particular class.
'More Than' Type: This shows the number of observations that are greater than or equal to the lower limit of a particular class.

Creating a 'Less Than' Cumulative Frequency Table

Start with the first class. Its cumulative frequency is just its own frequency.
For the next class, add its frequency to the cumulative frequency of the class before it.
Continue this process of adding the current class frequency to the previous cumulative frequency until the last class is reached.
The cumulative frequency of the last class will be equal to the total number of observations ($N$).

Creating a 'More Than' Cumulative Frequency Table

Start with the total number of observations ($N$). This is the cumulative frequency for 'more than or equal to' the lower limit of the very first class.
For the next class, subtract the frequency of the first class from the total ($N$).
Continue this process of subtracting the frequency of the preceding class from the preceding cumulative frequency.
The cumulative frequency of the last class will be equal to its own frequency.

Example 1. Construct the 'less than' and 'more than' cumulative frequency tables for the following grouped frequency distribution.

Class Interval	Frequency
10-20	1
20-30	1
30-40	3
40-50	4
50-60	5
60-70	4
70-80	5
80-90	3
90-100	4
Total	30

Answer:

'Less Than' Cumulative Frequency Table

This table shows the number of students who scored less than the upper limit of each class interval.

Marks	Cumulative Frequency (No. of Students)
Less than 20	1
Less than 30	$1 + 1 = 2$
Less than 40	$2 + 3 = 5$
Less than 50	$5 + 4 = 9$
Less than 60	$9 + 5 = 14$
Less than 70	$14 + 4 = 18$
Less than 80	$18 + 5 = 23$
Less than 90	$23 + 3 = 26$
Less than 100	$26 + 4 = 30$

'More Than' Cumulative Frequency Table

This table shows the number of students who scored more than or equal to the lower limit of each class interval.

Marks	Cumulative Frequency (No. of Students)
More than or equal to 10	30
More than or equal to 20	$30 - 1 = 29$
More than or equal to 30	$29 - 1 = 28$
More than or equal to 40	$28 - 3 = 25$
More than or equal to 50	$25 - 4 = 21$
More than or equal to 60	$21 - 5 = 16$
More than or equal to 70	$16 - 4 = 12$
More than or equal to 80	$12 - 5 = 7$
More than or equal to 90	$7 - 3 = 4$
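Both tables can be reproduced with running totals. This Python sketch uses `itertools.accumulate` from the standard library; the variable names are illustrative. The 'more than' column is obtained by accumulating from the last class backwards and then reversing the result.

```python
from itertools import accumulate

classes = [(lower, lower + 10) for lower in range(10, 100, 10)]  # 10-20, ..., 90-100
freqs = [1, 1, 3, 4, 5, 4, 5, 3, 4]

# 'Less than' type: running totals from the first class forward.
less_than = list(accumulate(freqs))

# 'More than or equal to' type: running totals from the last class backward,
# then reversed so they line up with the classes again.
more_than = list(accumulate(reversed(freqs)))[::-1]

for (lower, upper), lt, mt in zip(classes, less_than, more_than):
    print(f"less than {upper}: {lt:2d}    more than or equal to {lower}: {mt:2d}")
```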

Graphic Representation of Data - Bar Graph, Histogram & Frequency Polygon

Presenting data in tables, such as frequency distribution tables, helps in organising and summarising it. However, visual representation of data through graphs makes it much easier to understand patterns, compare frequencies, and grasp the overall distribution quickly. Common graphical representations used for different types of data include bar graphs, histograms, and frequency polygons.

Bar Graph:

A bar graph is a visual representation used primarily for displaying ungrouped data or data related to discrete variables or categories. In a bar graph, rectangular bars of uniform width are drawn with equal spacing between them. The height of each bar is directly proportional to the frequency or value of the corresponding observation or category.

Features of a Bar Graph:

Used for representing discrete data or data categorised into distinct groups.
Rectangular bars have uniform width.
There is a constant and equal space maintained between consecutive bars.
Bars can be drawn either vertically (commonly) or horizontally.
One axis (usually the x-axis) represents the categories or observations.
The other axis (usually the y-axis) represents the frequency or value.
The height (or length, if horizontal) of each bar is proportional to the frequency/value it represents.

Example 1. The following table shows the monthly expenditure of a family on various items. Represent this data using a bar graph.

Head (Item)	Expenditure (in ₹)
House Rent	3000
Food	3400
Education	800
Electricity	400
Transport	600
Miscellaneous	1200

Answer:

Given:

A table showing the monthly expenditure of a family on different items.

To Construct:

A bar graph representing the given data.

Solution:

We will represent the different items (heads) on the horizontal axis (X-axis) and the corresponding expenditure (in ₹) on the vertical axis (Y-axis).

Steps of Construction:

Draw two perpendicular lines: a horizontal X-axis and a vertical Y-axis.
Along the X-axis, mark the different expenditure heads (House Rent, Food, Education, etc.), keeping equal spacing between them.
Along the Y-axis, choose a suitable scale to represent the expenditure. The maximum value is 3400. Let's choose a scale where 1 unit = ₹500. Mark the points 0, 500, 1000, 1500, ..., up to 3500.
For each item on the X-axis, draw a rectangular bar of uniform width. The height of the bar should correspond to the expenditure value on the Y-axis according to the chosen scale.
For example, for 'House Rent', the expenditure is ₹3000, so the bar will be drawn up to the 3000 mark on the Y-axis. For 'Food', the expenditure is ₹3400, so the bar will be drawn up to 3400 (just below 3500).

The resulting bar graph will look like this:

A bar graph showing the monthly expenditure of a family. The X-axis has items like House Rent, Food, etc. The Y-axis shows expenditure in rupees. The height of each bar corresponds to the amount spent.
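The scale arithmetic in the construction steps can be checked with a short Python sketch. The variable names are illustrative, and it assumes the scale chosen above, 1 unit on the Y-axis = ₹500. It converts each expenditure into a bar height in Y-axis units.

```python
# Monthly expenditure (in rupees) from the table above.
expenditure = {
    "House Rent": 3000,
    "Food": 3400,
    "Education": 800,
    "Electricity": 400,
    "Transport": 600,
    "Miscellaneous": 1200,
}
RUPEES_PER_UNIT = 500  # chosen scale: 1 unit on the Y-axis = 500 rupees

bar_heights = {head: amount / RUPEES_PER_UNIT for head, amount in expenditure.items()}

for head, height in bar_heights.items():
    print(f"{head:<14} {expenditure[head]:>5} rupees -> bar height {height} units")
```

For example, the 'Food' bar comes out 6.8 units tall, just below the 3500 mark (7 units).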

Histogram:

A histogram is a graphical representation used specifically for displaying grouped data with continuous class intervals. In a histogram, rectangular bars are drawn such that the base of each bar represents a class interval, and the height of the bar represents the frequency of that class interval. Unlike bar graphs, there is no gap between consecutive bars because the class intervals are continuous.

Features of a Histogram:

Used for representing continuous grouped data.
Rectangular bars are drawn adjacent to each other without any gaps between them.
The base of each bar corresponds to a class interval.
The x-axis represents the class intervals (using the class limits).
The y-axis represents the frequency.
The height of each bar is proportional to the frequency of the corresponding class interval (assuming uniform class width). If class widths vary, the height is adjusted to make the area proportional to frequency (by calculating frequency density = frequency / class width).
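The frequency-density adjustment for unequal class widths can be sketched in Python as follows. The function name is our own, and the class intervals and frequencies are hypothetical numbers made up purely for illustration. Each bar's height is frequency divided by class width, so its area (width × height) equals the class frequency.

```python
def histogram_bars(classes, freqs):
    """Return (lower limit, class width, height) for each bar.

    The height is the frequency density, frequency / class width, so the
    area of each rectangle (width * height) equals its class frequency.
    """
    bars = []
    for (lower, upper), f in zip(classes, freqs):
        width = upper - lower
        bars.append((lower, width, f / width))
    return bars


# Hypothetical distribution with unequal class widths (made-up numbers).
classes = [(0, 10), (10, 20), (20, 40)]
freqs = [5, 8, 12]

for lower, width, height in histogram_bars(classes, freqs):
    print(f"base {lower}-{lower + width}: height {height}, area {width * height}")
```

Without the adjustment, the 20-40 bar would be drawn 12 units tall over a double-width base and would visually overstate that class; with it, the bar is only 0.6 units tall.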

If the first class interval does not start from zero (the origin), a kink or zig-zag line is usually shown on the x-axis near the origin to indicate a break in the scale.

Example 2. Draw a histogram for the frequency distribution of marks obtained by 30 students, as given in the table below.

Marks (Class Interval)	Frequency (No. of Students)
10-20	1
20-30	1
30-40	3
40-50	4
50-60	5
60-70	4
70-80	5
80-90	3
90-100	4

Answer:

Given:

A grouped frequency distribution table of marks.

To Construct:

A histogram for the given data.

Solution:

The class intervals are continuous (10-20, 20-30, etc.), so we can proceed with drawing the histogram directly.

Steps of Construction:

Draw a horizontal X-axis and a vertical Y-axis.
On the X-axis, represent the class intervals (marks). Since the data starts from 10 and not 0, we will put a kink or jagged line near the origin to show a break in the scale. Then, mark the class limits: 10, 20, 30, ..., 100 at equal distances.
On the Y-axis, represent the frequency (number of students). The maximum frequency is 5, so we can choose a scale like 1 unit = 1 student and mark points 1, 2, 3, 4, 5.
For each class interval, draw a rectangular bar with the class interval as its base and the corresponding frequency as its height.
For example, for the class interval 10-20, the base is from 10 to 20 on the X-axis, and the height is 1 on the Y-axis. For 20-30, the base is from 20 to 30, and the height is 1. This continues for all classes.
Since the data is continuous, the bars will be adjacent with no gaps in between.

The resulting histogram is shown below:

A histogram showing the distribution of student marks. The X-axis represents marks in class intervals (10-20, 20-30, etc.) and has a kink at the beginning. The Y-axis represents the number of students. The bars are adjacent to each other.

Frequency Polygon:

A frequency polygon is a line graph used to represent grouped data. It can be constructed in two ways: by joining the midpoints of the top sides of the rectangles in a histogram, or by plotting points corresponding to the class marks and their frequencies and connecting them.

Features of a Frequency Polygon:

Used for representing grouped data.
It is typically a closed polygon.
Points are plotted at the class marks (midpoints of the class intervals) on the x-axis and their corresponding frequencies on the y-axis.
To close the polygon and bring it down to the x-axis, imaginary class intervals with zero frequency are added at the beginning and at the end of the distribution. The class mark of the imaginary class before the first class is one class width less than the class mark of the first class. The class mark of the imaginary class after the last class is one class width more than the class mark of the last class.

How to Draw a Frequency Polygon:

Method 1 (Using Histogram):
1. Draw the histogram for the given grouped frequency distribution.
2. Mark the midpoint of the top side of each rectangle in the histogram.
3. Mark the midpoint of the imaginary class interval before the first class (the point on the x-axis one class width to the left of the first class mark) and the midpoint of the imaginary class interval after the last class (the point on the x-axis one class width to the right of the last class mark).
4. Join all these midpoints with straight line segments in order. This forms the frequency polygon.
Method 2 (Without Histogram):
1. Calculate the class mark for each class interval in the grouped frequency distribution table.
2. Calculate the class marks for the two imaginary class intervals with zero frequency (one before the first class and one after the last class).
3. Plot the points on a graph paper with class marks on the x-axis and corresponding frequencies (including 0 for imaginary classes) on the y-axis.
4. Join these plotted points with straight line segments in order. This forms the frequency polygon.
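Method 2 can be sketched as a Python function that returns the points to plot. The function name is hypothetical, and the sketch assumes continuous classes of equal width listed in order. It computes the class marks and adds a zero-frequency imaginary class at each end.

```python
def frequency_polygon_points(classes, freqs):
    """Return the (class mark, frequency) vertices of a frequency polygon.

    Assumes continuous classes of equal width, listed in order. An imaginary
    zero-frequency class is added before the first and after the last class
    so that the polygon meets the x-axis at both ends.
    """
    width = classes[0][1] - classes[0][0]
    marks = [(lower + upper) / 2 for lower, upper in classes]
    points = [(marks[0] - width, 0)]
    points.extend(zip(marks, freqs))
    points.append((marks[-1] + width, 0))
    return points


# Cost of living index data from Example 4.
classes = [(140, 150), (150, 160), (160, 170), (170, 180), (180, 190), (190, 200)]
weeks = [5, 10, 20, 9, 6, 2]
print(frequency_polygon_points(classes, weeks))
```

For the Example 4 data this gives the points (135, 0), (145, 5), (155, 10), (165, 20), (175, 9), (185, 6), (195, 2) and (205, 0), with the class marks printed as floats.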

Example 3. Draw a frequency polygon for the data in Example 2 by first drawing a histogram.

Answer:

Given:

The frequency distribution of marks from Example 2.

To Construct:

A frequency polygon using a histogram.

Solution:

Step 1: Draw the histogram.

We first draw the histogram for the given data as done in Example 2.

Step 2: Find and mark the midpoints.

We calculate the class mark (midpoint) for each class interval and mark this point on the top of each corresponding bar.

10-20: Midpoint = 15
20-30: Midpoint = 25
30-40: Midpoint = 35
40-50: Midpoint = 45
50-60: Midpoint = 55
60-70: Midpoint = 65
70-80: Midpoint = 75
80-90: Midpoint = 85
90-100: Midpoint = 95

Step 3: Add imaginary classes to close the polygon.

We add an imaginary class before the first class (0-10) and one after the last class (100-110). Their frequencies are 0. Their midpoints are 5 and 105, respectively.

Step 4: Connect the points.

Join the midpoints sequentially with straight line segments, starting from the midpoint of the first imaginary class (5,0), going through the midpoints of all the bars, and ending at the midpoint of the last imaginary class (105,0).

A histogram with a frequency polygon drawn over it. The polygon connects the midpoints of the tops of the bars and is anchored to the x-axis.

Example 4. Draw a frequency polygon for the data given below without drawing a histogram.

Cost of living index	Number of weeks
140 - 150	5
150 - 160	10
160 - 170	20
170 - 180	9
180 - 190	6
190 - 200	2

Answer:

Given:

A grouped frequency table of the cost of living index.

To Construct:

A frequency polygon without a histogram.

Solution:

Step 1: Calculate the class marks.

We need to find the midpoint (class mark) of each class interval. We also need to find the class marks for the imaginary classes before and after the given data range to anchor the polygon to the x-axis.

Class Interval	Class Mark	Frequency
130 - 140	135	0
140 - 150	145	5
150 - 160	155	10
160 - 170	165	20
170 - 180	175	9
180 - 190	185	6
190 - 200	195	2
200 - 210	205	0

Step 2: Plot the points.

We will now plot the points (Class Mark, Frequency) on a graph. The X-axis will represent the Class Marks (Cost of living index) and the Y-axis will represent the Frequency (Number of weeks).

The points to be plotted are: (135, 0), (145, 5), (155, 10), (165, 20), (175, 9), (185, 6), (195, 2), and (205, 0).
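The list of points above can be generated mechanically. The following is a minimal Python sketch, assuming equal class widths as in this table. The function name `frequency_polygon_points` is hypothetical.

```python
def frequency_polygon_points(intervals, frequencies):
    """Return the (class mark, frequency) vertices of a frequency polygon,
    including a zero-frequency imaginary class at each end.
    Assumes all class intervals have the same width."""
    width = intervals[0][1] - intervals[0][0]
    first_lower = intervals[0][0]
    last_upper = intervals[-1][1]
    # Midpoint of the imaginary class before the first class.
    points = [(first_lower - width / 2, 0)]
    for (lower, upper), f in zip(intervals, frequencies):
        points.append(((lower + upper) / 2, f))
    # Midpoint of the imaginary class after the last class.
    points.append((last_upper + width / 2, 0))
    return points

cost_of_living = [(140, 150), (150, 160), (160, 170),
                  (170, 180), (180, 190), (190, 200)]
weeks = [5, 10, 20, 9, 6, 2]
print(frequency_polygon_points(cost_of_living, weeks))
```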

Step 3: Connect the points.

Join the plotted points in order using straight line segments to form the frequency polygon.

A frequency polygon showing the distribution of the cost of living index. The X-axis represents class marks and the Y-axis represents the number of weeks. The polygon is formed by connecting plotted points.

Measures of Central Tendency of Ungrouped Data

After collecting and organising data, the next step in statistical analysis is often to summarise the data using a single value that represents the typical or central characteristic of the dataset. These values are called Measures of Central Tendency. They provide a summary of the location or centre around which the data points cluster.

The three most commonly used measures of central tendency for ungrouped data (data that is not grouped into class intervals) are the Mean, Median, and Mode.

Mean (Arithmetic Mean):

The mean, also known as the arithmetic mean or average, is the most commonly used measure of central tendency. It is calculated by summing all the observations in a dataset and then dividing the sum by the total number of observations.

If we have $n$ observations in a dataset, denoted by $x_1, x_2, ..., x_n$, the mean (represented by the symbol $\overline{x}$, read as 'x-bar') is given by the formula:

$\overline{x} = \frac{\text{Sum of all observations}}{\text{Total number of observations}}$

Written out in full, and then in summation notation (where $\sum$, the Greek capital letter sigma, stands for 'the sum of'):

$\overline{x} = \frac{x_1 + x_2 + ... + x_n}{n}$

$\overline{x} = \frac{\sum\limits_{i=1}^{n} x_i}{n}$

Here, $\sum\limits_{i=1}^{n} x_i$ means the sum of all observations from $i=1$ to $n$.

Feature: The mean is affected by every observation in the dataset, including extreme values (outliers), which can pull the mean towards them.
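The formula translates directly into code. The following is a minimal Python sketch. The function name `arithmetic_mean` and the sample values are hypothetical and are used only to illustrate the call.

```python
def arithmetic_mean(observations):
    """Mean = (sum of all observations) / (number of observations)."""
    n = len(observations)
    if n == 0:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(observations) / n

# Hypothetical observations, used only to show the call.
print(arithmetic_mean([2, 4, 6, 8]))
```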

Median:

The median is the middle value of a dataset when the data is arranged in order of magnitude (either ascending or descending). It is the value that separates the higher half of the data from the lower half.

How to find the Median for Ungrouped Data:

Arrange the Data: Arrange all the observations in the dataset in either ascending order (from smallest to largest) or descending order (from largest to smallest).
Count Observations: Determine the total number of observations in the dataset, denoted by $n$.
Find the Middle Position(s):
- If $n$ is an odd number, there is a single middle observation. The position of the median is given by $\left(\frac{n+1}{2}\right)^{\text{th}}$. The median is the observation found at this position in the ordered data.
- If $n$ is an even number, there are two middle observations. The positions of these observations are $\left(\frac{n}{2}\right)^{\text{th}}$ and $\left(\frac{n}{2} + 1\right)^{\text{th}}$. The median is the average (arithmetic mean) of these two middle observations.

If $n$ is odd: Median $= \text{Value of the } \left(\frac{n+1}{2}\right)^{\text{th}} \text{ observation in ordered data}$

If $n$ is even: Median $= \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ observation} + \left(\frac{n}{2} + 1\right)^{\text{th}} \text{ observation}}{2} \text{ (in ordered data)}$

Feature: The median is a positional average. It depends only on the middle position(s) of the ordered data, so it is not affected by how large or small the extreme values are. This makes it a useful measure when the data contains outliers.
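The odd/even rule can be sketched in Python as follows. The function name `median` is illustrative. Python lists are indexed from 0, so the $k^{\text{th}}$ observation is `data[k - 1]`.

```python
def median(observations):
    """Median of ungrouped data, following the odd/even rule above."""
    data = sorted(observations)          # arrange in ascending order
    n = len(data)                        # count the observations
    if n == 0:
        raise ValueError("the median of an empty dataset is undefined")
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th observation
        return data[(n + 1) // 2 - 1]
    # Even n: average of the (n/2)th and (n/2 + 1)th observations
    return (data[n // 2 - 1] + data[n // 2]) / 2

print(median([7, 3, 9]), median([7, 3, 9, 1]))
```

Sorting first matters. Taking the middle of the unsorted list would give a wrong answer in general.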

Mode:

The mode is the observation that appears most frequently in a dataset. It represents the most common value in the data.

How to find the Mode for Ungrouped Data:

Count Frequencies: Count how many times each distinct observation appears in the dataset.
Identify Highest Frequency: Find the observation(s) that have the highest frequency.
Determine the Mode: The observation(s) with the highest frequency is/are the mode(s).

A dataset can have:

One mode (unimodal).
Two modes with the same highest frequency (bimodal).
More than two modes with the same highest frequency (multimodal).
No mode, if every observation occurs the same number of times (for example, when no value repeats).

Feature: The mode is useful for identifying the most typical category or value, and it can be used for both numerical and categorical (non-numerical) data.
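The three steps for finding the mode can be sketched with Python's `collections.Counter`. The function name `modes` is hypothetical. It returns a list, so that bimodal and multimodal data report every mode, and it returns an empty list for the "no mode" case described above. A dataset with a single distinct value is treated as having that value as its mode.

```python
from collections import Counter

def modes(observations):
    """Return every value with the highest frequency, in sorted order.
    Returns [] when two or more distinct values all occur equally often."""
    counts = Counter(observations)           # frequency of each distinct value
    if not counts:
        return []
    highest = max(counts.values())           # highest frequency
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []                            # every value ties: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 2, 3]), modes([1, 1, 2, 2, 3]), modes([1, 2, 3]))
print(modes(["red", "blue", "red"]))         # works for categorical data too
```

Python's standard `statistics.multimode` returns every value when all frequencies are equal, so the "no mode" convention is applied by hand here.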

Example 1. Find the mean, median, and mode of the following data: 15, 18, 16, 15, 17, 15, 12, 18, 16, 15.

Answer:

Given:

The dataset: 15, 18, 16, 15, 17, 15, 12, 18, 16, 15.

Total number of observations, $n = 10$.

To Find:

Mean, Median, and Mode.

Solution:

Mean:

To find the mean, sum all the observations and divide by the total number of observations ($n=10$).

Sum of observations = $15 + 18 + 16 + 15 + 17 + 15 + 12 + 18 + 16 + 15$

Sum of observations = 157

... (1)

$\overline{x} = \frac{\text{Sum of observations}}{n}$

Substitute the sum from equation (1) and $n=10$:

$\overline{x} = \frac{157}{10}$

$\overline{x} = 15.7$

... (2)

The mean of the data is 15.7.

Median:

To find the median, first arrange the data in ascending order:

Ordered Data: 12, 15, 15, 15, 15, 16, 16, 17, 18, 18.

The number of observations is $n = 10$, which is an even number.

For an even number of observations, the median is the average of the two middle observations. The positions of the middle observations are $\frac{n}{2}$ and $\frac{n}{2} + 1$.

Position of first middle observation $= \frac{10}{2} = 5^{\text{th}}$

Position of second middle observation $= \frac{10}{2} + 1 = 5 + 1 = 6^{\text{th}}$

From the ordered data (12, 15, 15, 15, 15, 16, 16, 17, 18, 18):

5th observation = 15

... (3)

6th observation = 16

... (4)

Median = Average of 5th and 6th observations:

Median $= \frac{5^{\text{th}} \text{ observation} + 6^{\text{th}} \text{ observation}}{2}$

Substitute the values from equations (3) and (4):

Median $= \frac{15 + 16}{2}$

Median $= \frac{31}{2} = 15.5$

... (5)

The median of the data is 15.5.

Mode:

To find the mode, identify the observation that occurs most frequently in the dataset.

Let's list the distinct observations and their frequencies:

12: occurs 1 time.
15: occurs 4 times.
16: occurs 2 times.
17: occurs 1 time.
18: occurs 2 times.

The observation with the highest frequency is 15, which occurs 4 times.

Mode = 15

... (6)

The mode of the data is 15.
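As a cross-check, Python's standard `statistics` module can reproduce all three answers (`multimode` requires Python 3.8 or later). This is offered only as a way to verify the hand calculation.

```python
import statistics

data = [15, 18, 16, 15, 17, 15, 12, 18, 16, 15]

mean = statistics.mean(data)          # 157 / 10
median = statistics.median(data)      # average of the 5th and 6th ordered values
modes = statistics.multimode(data)    # all values with the highest frequency

print(mean, median, modes)
```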

Relationship Between Mean, Median and Mode

The mean, median, and mode are all measures of central tendency, but they represent the "centre" of the data in different ways. The relationship between these three measures depends on the shape of the data distribution.

Symmetrical Distribution:

For a perfectly symmetrical distribution with a single peak (such as the bell-shaped normal distribution), the data is evenly spread around the centre, and the mean, median and mode coincide. For real data that is only roughly symmetrical, the three measures are approximately equal.

Mean $\approx$ Median $\approx$ Mode

Skewed Distribution:

For skewed distributions, the data is not evenly spread. There is a tail extending towards one end.

Positively Skewed Distribution (Skewed to the Right): The tail is longer on the right side. Reading from left to right, the mode is at the peak, followed by the median, while the mean is pulled furthest towards the long right tail (typically mean > median > mode).

Negatively Skewed Distribution (Skewed to the Left): The tail is longer on the left side. Reading from right to left, the mode is at the peak, followed by the median, while the mean is pulled furthest towards the long left tail (typically mean < median < mode).
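The ordering of the three measures can be seen on small datasets. The two lists below are hypothetical and were chosen only so that each has a single mode and one value forming a long tail. The sketch uses Python's standard `statistics` module, and `measures` is an illustrative helper name.

```python
import statistics

def measures(data):
    """Return (mean, median, mode) for a dataset with a single mode."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)

right_skewed = [1, 2, 2, 3, 4, 10]   # one large value forms a long right tail
left_skewed = [1, 7, 8, 9, 9, 10]    # one small value forms a long left tail

mean, median, mode = measures(right_skewed)
print(f"right-skewed: mean={mean:.2f}, median={median}, mode={mode}")
mean, median, mode = measures(left_skewed)
print(f"left-skewed:  mean={mean:.2f}, median={median}, mode={mode}")
```

In the first list, the value 10 drags the mean above the median. In the second list, the value 1 drags it below.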

Empirical Relationship:

The exact relationship depends on the shape of the distribution, but for moderately skewed distributions the following empirical relationship gives a good approximation:

$\text{Mode} \approx 3 \times \text{Median} - 2 \times \text{Mean}$

This is not a strict mathematical formula derived from definitions, but an observed relationship that holds approximately for moderately skewed, single-peaked distributions. It can be used to estimate one of the measures when the other two are known.
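Rearranging the same relationship gives an estimate for each measure in terms of the other two. For example, $\text{Median} \approx \frac{\text{Mode} + 2 \times \text{Mean}}{3}$. The following is a minimal Python sketch. The function names are hypothetical, and the results are only as reliable as the empirical formula itself.

```python
def estimate_mode(mean, median):
    """Mode ~ 3 * Median - 2 * Mean (moderately skewed data only)."""
    return 3 * median - 2 * mean

def estimate_median(mean, mode):
    """Median ~ (Mode + 2 * Mean) / 3, the same relationship rearranged."""
    return (mode + 2 * mean) / 3

def estimate_mean(median, mode):
    """Mean ~ (3 * Median - Mode) / 2, the same relationship rearranged."""
    return (3 * median - mode) / 2

print(estimate_mode(25, 26), estimate_median(25, 28), estimate_mean(26, 28))
```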

Example 1. If the mean and median of a data set are 25 and 26 respectively, estimate the mode using the empirical formula.

Answer:

Given:

Mean ($\overline{x}$) = 25.

Median = 26.

To Estimate:

The mode using the empirical formula.

Solution:

Using the empirical relationship between mean, median, and mode:

Mode $\approx 3 \times \text{Median} - 2 \times \text{Mean}$

Substitute the given values of Mean and Median:

Mode $\approx 3 \times (26) - 2 \times (25)$

Perform the multiplications:

Mode $\approx 78 - 50$

Perform the subtraction:

Mode $\approx 28$

... (1)

The estimated value of the mode is 28.

Problems on Arithmetic Mean

The arithmetic mean is a key measure of central tendency. For ungrouped data, calculating the mean is straightforward using the definition: summing all observations and dividing by the total count. This section provides examples illustrating the calculation and application of the arithmetic mean for ungrouped datasets.

Formula for Arithmetic Mean (Ungrouped Data):

If we have a dataset consisting of $n$ observations, denoted as $x_1, x_2, x_3, ..., x_n$, the arithmetic mean ($\overline{x}$) is calculated as:

$\overline{x} = \frac{\text{Sum of all observations}}{\text{Total number of observations}}$

Written out in full, and then compactly in summation notation:

$\overline{x} = \frac{x_1 + x_2 + ... + x_n}{n}$

$\overline{x} = \frac{\sum\limits_{i=1}^{n} x_i}{n}$

Here, $\sum\limits_{i=1}^{n} x_i$ means the sum of all the observations from the first observation ($x_1$) to the $n$-th observation ($x_n$).

Example 1. Calculate the mean of the first five prime numbers.

Answer:

Given:

The first five prime numbers.

To Find:

The arithmetic mean.

Solution:

First, identify the first five prime numbers. A prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself.

The first five prime numbers are 2, 3, 5, 7, and 11.

The total number of observations ($n$) is 5.

Next, find the sum of these observations:

Sum of observations = $2 + 3 + 5 + 7 + 11$

Sum of observations = $28$

... (1)

Now, use the formula for the mean:

$\overline{x} = \frac{\text{Sum of observations}}{n}$

Substitute the sum from equation (1) and $n=5$:

$\overline{x} = \frac{28}{5}$

Perform the division:

$\overline{x} = 5.6$

... (2)

The mean of the first five prime numbers is 5.6.
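Both steps, listing the primes and averaging them, can be checked with a short Python sketch. The helper `first_primes` is hypothetical and uses simple trial division, which is adequate for small counts like this one.

```python
def first_primes(count):
    """Return the first `count` prime numbers using trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        # candidate is prime if no prime p with p * p <= candidate divides it
        if all(candidate % p != 0 for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes

primes = first_primes(5)
mean = sum(primes) / len(primes)
print(primes, mean)
```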

Example 2. The mean of 5 observations is 10. If four of the observations are 8, 12, 10, and 11, find the fifth observation.

Answer:

Given:

Mean of 5 observations = 10.

Four of the observations are 8, 12, 10, and 11.

To Find:

The value of the fifth observation.

Solution:

Let the five observations be $x_1, x_2, x_3, x_4, x_5$. We are given that the total number of observations ($n$) is 5 and the mean ($\overline{x}$) is 10.

The formula for the mean is $\overline{x} = \frac{\text{Sum of observations}}{n}$.

Substituting the given values into the formula:

$10 = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{5}$

To find the sum of the observations, multiply the mean by the number of observations:

Sum of observations $= \overline{x} \times n = 10 \times 5$

Sum of observations $= 50$

... (1)

We are given the values of four of the observations: 8, 12, 10, and 11. Let the unknown fifth observation be represented by the variable $y$.

The sum of the five observations is the sum of the four known observations plus the unknown observation $y$:

Sum of observations $= 8 + 12 + 10 + 11 + y$

Sum of observations $= 41 + y$

... (2)

Equating the two expressions for the sum of observations from equations (1) and (2):

$41 + y = 50$

To find the value of $y$, subtract 41 from both sides of the equation:

$y = 50 - 41$

$y = 9$

... (3)

The fifth observation is 9.

We can verify this by calculating the mean of the five observations 8, 12, 10, 11, and 9: $\frac{8+12+10+11+9}{5} = \frac{50}{5} = 10$, which matches the given mean.
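The method generalises. The sum of all $n$ observations is $\overline{x} \times n$, so a single unknown observation equals that total minus the sum of the known ones. The following is a minimal Python sketch, in which `missing_observation` is a hypothetical helper name.

```python
def missing_observation(mean, n, known):
    """Return the one unknown observation, given the mean of all n observations
    and the n - 1 known values: unknown = mean * n - (sum of known values)."""
    if len(known) != n - 1:
        raise ValueError("expected exactly n - 1 known observations")
    total = mean * n            # sum of all n observations
    return total - sum(known)

fifth = missing_observation(10, 5, [8, 12, 10, 11])
print(fifth)
```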

Example 3. A student's marks in 5 subjects are 75, 80, 65, 90, 70. What is the student's average mark?

Answer:

Given:

Marks in 5 subjects: 75, 80, 65, 90, 70.

Number of subjects (observations), $n = 5$.

To Find:

The student's average mark (arithmetic mean).

Solution:

To find the average mark, we calculate the sum of the marks in all subjects and divide by the number of subjects.

Sum of marks = $75 + 80 + 65 + 90 + 70$

Let's add the marks:

Sum of marks = 380

... (1)

Using the formula for the mean:

$\overline{x} = \frac{\text{Sum of marks}}{n}$

Substitute the sum from equation (1) and $n=5$:

$\overline{x} = \frac{380}{5}$

Perform the division:

$\overline{x} = 76$

... (2)

The student's average mark is 76.

10	20	36	92	95	40	50	56	60	70
92	88	80	70	72	70	36	40	36	40
92	40	50	50	56	60	70	60	60	88